Abstract — Over the last two decades, fixed-coefficient FIR 
filters have generally been optimized by minimizing the number 
of adders required to implement the multiplier block in the 
transposed direct form filter structure. In this paper, an 
optimization method for the structural adders in the transposed 
tapped delay line is proposed. Although additional registers are 
required, a trade-off can be made such that the overall 
combinational logic is reduced. For a majority of taps, the delay 
through the structural adder is shortened; the exception is the 
last optimized tap, whose delay increases by one full adder 
delay. This increase is tolerable because in most cases it does 
not fall on the critical path. Area reductions of up to 4.5% to 
9.5% and power reductions of up to 10% to 30% are estimated 
theoretically for the structural adder block of three benchmark 
filters. The saving becomes more prominent as the number of 
taps grows. The criterion under which the number of LUTs, the 
number of bonded IOBs, and the number of slices are reduced is 
derived. Actual synthesis results are obtained with Xilinx ISE 
Design Suite 14.3 (Spartan 3E family, device XC3S100) and 
Cadence RTL Compiler with 0.18 µm TSMC CMOS libraries. 

Index Terms — FIR filter, normal structural adder, 
proposed structural adder reduction, Xilinx ISE Design Suite 
14.3 (Spartan 3E family, device XC3S100), Cadence RTL 
Compiler with 0.18 µm TSMC CMOS libraries, area and 
power reduction. 

I. INTRODUCTION 

Their inherent stability makes FIR filters a preferred choice in 
digital signal processing. As wireless technology advances, 
FIR filters with shorter transition bands, more stringent 
stopband attenuation requirements, and higher sampling rates 
are in great demand. To achieve these goals, ASIC 
implementation is necessary. The Transposed Direct Form 
(TDF) structure is preferred over the direct form structure for 
higher-order ASIC filters due to its shorter critical path delay. 
In the direct form structure, the input is delayed before the 
coefficient multiplication, and the register length of each tap is 
fixed by the input bit width. In the TDF structure, the partial 
sums formed from the outputs of the coefficient multipliers 
are delayed instead. Thus, the lengths of the registers increase 
monotonically along the taps to hold the correct precision of 
the partial sums. Consequently, the number of register bits 
needed for the TDF structure is larger than that for the direct 
form. 

Fig. 1 shows a generic TDF fixed coefficient FIR filter. For 
long filters, the shorter critical path of the TDF is more 
significant than the costs of the registers. 

Fig. 1 Transposed direct form FIR Filter. 
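
The difference between the two structures can be made concrete with a small behavioral model. The sketch below is only an illustration in Python (the function names and the test coefficients are hypothetical, not taken from the paper): the direct form delays the input samples, whereas the TDF delays the partial sums produced by the structural adders. Both produce the same output sequence.

```python
def fir_direct(x, h):
    """Direct form: delay the input samples, then multiply and sum."""
    taps = [0] * len(h)                      # delay line of past inputs
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]
        y.append(sum(c * t for c, t in zip(h, taps)))
    return y


def fir_transposed(x, h):
    """Transposed direct form: multiply first, then delay the partial sums."""
    n = len(h)
    regs = [0] * (n - 1)                     # regs[i] holds a delayed partial sum
    y = []
    for sample in x:
        products = [c * sample for c in h]   # multiplier block outputs
        y.append(products[0] + (regs[0] if regs else 0))
        # Each structural adder adds one product to the delayed partial
        # sum arriving from the next tap; the last tap has no incoming sum.
        regs = [products[i + 1] + (regs[i + 1] if i + 1 < n - 1 else 0)
                for i in range(n - 1)]
    return y


h_example = [3, -1, 4, 2]                    # hypothetical coefficients
x_example = [1, 2, 0, -3, 5, 7]
print(fir_direct(x_example, h_example) == fir_transposed(x_example, h_example))
```

In the transposed model, the registers hold sums of products rather than raw inputs, which is why their widths must grow along the taps.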

For fixed-coefficient FIR filters, the bit widths of the input 
and of all coefficients are known. This enables the bit width of 
each coefficient multiplier output to be determined from its 
dynamic range. As the partial sums are delayed before they are 
added to the coefficient multiplier outputs in the structural 
adders, the bit widths of the structural adders increase 
monotonically from the first structural adder towards the output. 
Careful analysis reveals that for most filters, the bit width of the 
adder increases only from coefficient N-1 to about N/2, after 
which the bit width stays relatively constant and increases by 
no more than two bits. As the bit width of the coefficient 
multiplier output decreases towards the last tap, longer sign 
extension is required for these structural adders. This paper 
proposes an addition scheme to reduce the bit widths of these 
structural adders so that the total combinational logic is 
reduced at the expense of some register overhead. To 
determine whether the area reduction is able to offset the 
overheads of additional adders and registers, a lower bound for 
the difference between the adder bit width and the coefficient 
multiplier output bit width is established analytically. 
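
The width growth described above can be estimated from the coefficients alone. The following Python sketch is an assumption-laden illustration, not the paper's analysis: it uses a conservative worst-case bound (input magnitude times the running sum of coefficient magnitudes), and the helper names and the coefficient set are hypothetical.

```python
def signed_width(max_magnitude):
    """Two's-complement bits that hold every value in [-m, +m] (conservative)."""
    return max_magnitude.bit_length() + 1


def tdf_widths(coeffs, input_bits):
    """Return (product_widths, adder_widths), both indexed by tap number."""
    x_max = 1 << (input_bits - 1)            # largest input magnitude
    n = len(coeffs)
    product_w = [signed_width(abs(c) * x_max) for c in coeffs]
    adder_w = [0] * n
    running = 0
    for i in range(n - 1, -1, -1):           # partial sums flow from tap N-1 to tap 0
        running += abs(coeffs[i])
        adder_w[i] = signed_width(running * x_max)
    return product_w, adder_w


# Hypothetical symmetric low-pass coefficients and an 8-bit input.
h_example = [2, -5, 9, 40, 90, 120, 90, 40, 9, -5, 2]
prod_w, add_w = tdf_widths(h_example, input_bits=8)
for tap in range(len(h_example) - 1, -1, -1):
    gap = add_w[tap] - prod_w[tap]           # bits of sign extension at this tap
    print(f"tap {tap}: product {prod_w[tap]} bits, adder {add_w[tap]} bits, gap {gap}")
```

The printed gap is the difference between the adder bit width and the multiplier output bit width at each tap, which is the quantity the lower bound mentioned above concerns.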

II. PROPOSED STRUCTURAL ADDER OPTIMIZATION 

The fundamental concept of our proposed method can be 
illustrated by an example in decimal. Let {610,-274, 2, 258} 
be a set of coefficient multiplier outputs to be accumulated to 
a large partial sum 1234567 by the structural adders in a 
tapped delay line. A straightforward approach is to add one number 
at a time from the set of smaller integers to the large integer. 
Alternatively, the integers in the set are summed and then 
added to the large integer. The latter accumulation scheme, 
when implemented in hardware, requires the large integer and 
the smaller integers to be stored at each tap. This incurs a 
large register overhead, which can be reduced if the large 
number is split into two smaller integers as shown in Fig. 2. 

Fig. 2 Example of adder size reduction for decimal number 
accumulation (accumulated result: 1235163). 



By partitioning the large integer into two halves, the 
register overhead is greatly reduced as only the fourth 
overlapping digit has to be saved twice. The additional adder 
at the last step needs only a four-digit addition as the three 
least significant digits are all zeros. Besides, the reduction of 
the dynamic ranges of the operands also simplifies the 
structural adder implementation and reduces the length of 
sign extension. 
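
The decimal example can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only (the function names and the `low_digits` parameter are hypothetical): the partial sum is split after its three least significant digits, the small numbers are accumulated on the low part using one extra overlapping digit, and a single short addition merges the carry into the high part.

```python
def accumulate_directly(partial_sum, products):
    """Baseline: every structural adder spans the full partial sum."""
    for p in products:
        partial_sum += p
    return partial_sum


def accumulate_split(partial_sum, products, low_digits=3):
    """Split scheme: short adders on the low part, one merge adder at the end."""
    base = 10 ** low_digits
    high, low = divmod(partial_sum, base)    # 1234567 -> (1234, 567)
    for p in products:
        low += p                             # stays within low_digits + 1 digits here
    carry, low = divmod(low, base)           # overlap digit carries into the high part
    return (high + carry) * base + low       # merge: a short addition on the high digits


products = [610, -274, 2, 258]
print(accumulate_directly(1234567, products))   # 1235163
print(accumulate_split(1234567, products))      # 1235163
```

For the numbers above, the low part evolves as 567, 1177, 903, 905, 1163, so four digits suffice at every tap, and the final merge adds the carry digit 1 to 1234.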

Fig. 3 Binary example on the optimization of last three adders 
of Fig. 2. 

Fig. 3 shows the binary implementation of the proposed 
scheme on the last three coefficients of the filter example from 
Fig. 2. The reduction in adder length is observed to be 
several times larger than the register and adder overheads it 
incurs. Furthermore, the delays through the structural 
adders a2 and a1 are reduced, while the delay through 
a0 is increased by one full adder delay. The slight increase in 
the delay through a0 is not an issue because, in most cases, there 
exists at least one tap (i > 0) for which delay(x·c0) < 
delay(x·ci). The full adder reduction for the structural adders 
can, however, be offset by the increase in flip-flop overhead. 
Therefore, information about the minimal difference between the 
addends of the structural adders is of interest. 
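
The trade-off between saved full adders and added flip-flops can be illustrated with a rough bit-count model. The Python sketch below is an assumption, not the paper's derivation: it assumes ripple-carry adders (one full adder per bit), one flip-flop per register bit, a partial sum of `full_width` bits split so that the lower `low_width` bits plus `overlap` guard bits are accumulated at each tap, and a single merge adder over the upper bits. All names and the example numbers are hypothetical.

```python
def adder_and_register_cost(taps, full_width, low_width, overlap):
    """Bit-level cost of `taps` structural adders, normal versus split scheme."""
    short = low_width + overlap              # width of each shortened adder
    high = full_width - low_width            # width of the upper part and merge adder
    normal = {"full_adders": taps * full_width,
              "flip_flops": taps * full_width}
    split = {"full_adders": taps * short + high,     # short adders + one merge adder
             "flip_flops": taps * (high + short)}    # overlap bits are stored twice
    return normal, split


normal, split = adder_and_register_cost(taps=3, full_width=24, low_width=12, overlap=2)
print("full adders saved:", normal["full_adders"] - split["full_adders"])
print("extra flip-flops :", split["flip_flops"] - normal["flip_flops"])
```

Under these assumptions the saving grows with the number of optimized taps and with the gap between the full width and the shortened width, while the flip-flop overhead grows only with the overlap, which is why the minimal difference between the addends matters.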


III. NORMAL & PROPOSED ADDER IMPLEMENTATION 

Consider a large partial sum such as 1234567 and the 
coefficients {610, 214, 306, 3} of a TDF FIR filter. The 
accumulation by the normal and proposed adder methods is 
shown in Fig. 4. 


Fig. 4 Normal adder method and proposed adder method 


A. RTL Schematic of adders 



Fig. 5 RTL Schematic of proposed & normal adders 

Fig. 5 shows the RTL schematic of the proposed and normal 
adders. The simulation output for these adders is shown in 
Fig. 6. 


B. Simulation output of adder 



Fig. 6 Simulation output of normal & proposed adder 

C. Simulation report of normal & proposed adder 

The simulation report for the above methods, generated by 
Xilinx ISE Design Suite 14.3 for the Spartan 3E family, is 
shown in Table 1. 

TABLE 1 

Number of LUTs, Number of bonded IOBs, & Number of slices 
Calculation 

                               Used                       Utilization
Logic utilization        Normal  Proposed  Available   Normal  Proposed
Number of slices             54        33       2448       2%        1%
Number of LUTs              103        60       4896       2%        1%
Number of bonded IOBs       104        96        108      96%       88%

IV. TRANSPOSED DIRECT FORM FIR FILTER (TDF) 
BY USING NORMAL & PROPOSED ADDER METHOD 

A. RTL Schematic of TDF FIR filter 

The RTL schematic of the TDF FIR filter is shown in Fig. 7. 

Fig. 7 RTL Schematic of TDF FIR filter 

B. Simulation output of TDF FIR filter 

1. Simulation output of TDF FIR filter by using normal 
adder method: 

Fig. 8 Simulation output of TDF FIR filter by using normal 
adder method 

2. Simulation output of TDF FIR filter by using proposed 
adder method: 

Fig. 9 Simulation output of TDF FIR filter by using proposed 
adder method 

The simulation report for the above methods, generated by 
Xilinx ISE Design Suite 14.3 for the Spartan 3E family, is 
shown in Table 2. 

TABLE 2 

Number of LUTs, Number of bonded IOBs, & Number of slices 
Calculation of TDF FIR filter 

                               Used                       Utilization
Logic utilization        Normal  Proposed  Available   Normal  Proposed
Number of slices            247        81       2448      16%        4%
Number of LUTs              124        40       4896       6%        2%
Number of bonded IOBs       100        36        108     151%       54%

TABLE 3 

Area & Power Calculation of TDF FIR Filter 

        Area analysis (µm²)           Power analysis (nW)
        Normal      Proposed          Normal        Proposed
        11249       5430              405235.5      99615.7

V. IMPLEMENTATION RESULTS 


Fig. 6 shows the simulation output of the normal and 
proposed adders; this result is verified with Xilinx ISE Design 
Suite 14.3 (Spartan 3E family, device XC3S100). Table 1 
compares the normal and proposed adders. Fig. 8 and Fig. 9 
show the simulation outputs of the TDF FIR filter using the 
normal method and the proposed method, respectively. 
These results are verified with Xilinx ISE Design Suite 14.3 for 
the Spartan 3E family, and the area and power analysis is 
performed with Cadence RTL Compiler using 0.18 µm TSMC 
CMOS libraries. 
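
The relative savings implied by the reported figures can be computed directly from the "Normal" and "Proposed" columns of Tables 1 to 3. The short Python sketch below is only a convenience calculation on those reported values; the helper name and dictionary labels are hypothetical.

```python
def reduction_percent(normal, proposed):
    """Relative saving of the proposed design with respect to the normal one."""
    return 100.0 * (normal - proposed) / normal


# (normal, proposed) pairs copied from Tables 1, 2, and 3.
reported = {
    "adder slices (Table 1)": (54, 33),
    "adder LUTs (Table 1)": (103, 60),
    "adder bonded IOBs (Table 1)": (104, 96),
    "filter slices (Table 2)": (247, 81),
    "filter LUTs (Table 2)": (124, 40),
    "filter bonded IOBs (Table 2)": (100, 36),
    "filter area (Table 3)": (11249, 5430),
    "filter power (Table 3)": (405235.5, 99615.7),
}

for name, (normal, proposed) in reported.items():
    print(f"{name}: {reduction_percent(normal, proposed):.1f}% lower")
```

These percentages describe the synthesized designs as a whole and are therefore not directly comparable with the theoretical estimates for the structural adder block alone.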


VI. CONCLUSION 

This paper presents a new method to reduce the total area and 
power of fixed-coefficient transposed direct form FIR filters with 
a large number of taps by minimizing the bit widths of the 
structural adders. Sign extensions have been shortened and 
the delays through the structural adders have been reduced at 
the expense of some register overhead and a reduced-size 
merged adder for each bisection of a long partial sum. 
Theoretical estimates show an area reduction of up to 4.5% to 
9% and a power reduction of up to 10% to 30% for the structural 
adder block of the benchmark filters. 